The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverant tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table!
I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars), caption="I made this table with kable in the knitr package library")
| speed | dist | |
|---|---|---|
| Min. : 4.0 | Min. : 2.00 | |
| 1st Qu.:12.0 | 1st Qu.: 26.00 | |
| Median :15.0 | Median : 36.00 | |
| Mean :15.4 | Mean : 42.98 | |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 | |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!
Such snow. Very fun. Wow.
library(tidyverse)
## -- Attaching packages ---------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 2.2.1 v purrr 0.2.4
## v tibble 1.4.1 v dplyr 0.7.4
## v tidyr 0.7.2 v stringr 1.2.0
## v readr 1.1.1 v forcats 0.2.0
## -- Conflicts ------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(phyloseq)
metadata <- read.delim("Saanich.metadata.txt")
OTU <- read.delim("Saanich.OTU.txt")
load("phyloseq_object.RData")
ggplot(metadata, aes(x=NO3_uM, y=Depth_m))+
geom_point(color = "purple", shape=17, size = 2)
#Change from °C to °F
metadata %>%
mutate(Temperature_F = (Temperature_C*9/5)+32) %>%
select(Temperature_F, Depth_m) %>%
#Plot data
ggplot(., aes(x=Temperature_F, y=Depth_m))+
geom_point()
#convert physeq data to % and plot
physeq_percent = transform_sample_counts(physeq, function(x) 100*x/sum(x))
plot_bar(physeq_percent, fill="Phylum")+
geom_bar(aes(fill=Phylum), stat = "identity")+
#Add title and axis labels
ggtitle("Phyla from 10 to 200 m in Saanich Inlet")+
xlab("Sample depth") + ylab("Percent relative abundance")
#Gather data in 1 column
metadata_nutrients <- metadata %>%
gather(Nutrient, Concentration, O2_uM, PO4_uM, SiO2_uM, NO3_uM, NH4_uM, NO2_uM)%>%
select(Depth_m, Nutrient, Concentration)
#Plot facet plot
ggplot(metadata_nutrients, aes(x=Depth_m, y=Concentration))+
geom_line(color="grey")+
geom_point()+
facet_wrap(~Nutrient, scales="free_y")+
theme(legend.position = "none")+
ggtitle("Nutrient distribution across a depth of 10-200m in Saanich Inlet")+
xlab("Depth (m)")+
ylab("Nutrient Concentration (uM)")
This paper is a review that explained various geochemical signatures and trends during the Holocene with other literature, and the Anthropocene. It appears from the original paper that there were several different measures that were analyzed. Some measures include taking lake sediment cores to measure for constituents from anthropogenic materials of that sediment, glacier ice cores to measure atmospheric CO2, measuring sea level increases. There are also other key anthropogenic change markers such as temperature anomaly, methane as well as human made materials like concrete, plastic and black carbon from fossil fuel emissions.
As technological advances occur through time, human population growth increases, leading to a corresponding increase in resource usage (fossil fuels, metals, fertilizers, land use change), leading to an ecological change. As show in the original paper, there is a considerable increase in ice core nitrate levels, atmospheric CO2, methane, temperature, cement, plastic, black carbon, which can all be attributed to humans.
Human wastes and byproducts of production has caused the formation of persistent materials in the world, which can act as a geochemical signature. Plastics produced by humans have started accumulating in both shallow and deep water sediments which can form biological and/or chemical interactions.
Radioactive materials have also started appearing due to the testing and deployment of nuclear arms by the USA and Russia. This created an increase on 239Pu (Plutonium) from the fall out and can accumulate in the sediments, creating another signature of the Anthropocene given its long half life.
There has been a relatively large proportion of animals going extinct, due to intensive human land and resource usage.
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
The primrary prokaryotic habitats on Earth are aquatic environments (both fresh water and marine) across various regions in the open ocean and sediments, which has a total prokaryotic cell abundance of 1.2 x 1029, soil, which has a total prokaryotic cell abundance of 2.6 x 1029, and subsurface, which has a total prokaryotic cell abundance of 4.9 x 1030.
The estimated cell abundance of prokaryotic cells in the upper 200m of the ocean is 3.6 x 1028 and marine autotrophs make up 8% of those prokaryotic cells. This is significant as the autotrophs control the majority of the primary production in the ocean and provides oxygen to the atmosphere and carbon to marine heterotrophic bacteria.
Autotrophs are “self-nourishing” organisms that can fix inorganic carbon into biomass
Heterotrophs are organisms that can assimilate orgnaic carbon
Lithotrophs are organisms that can use inorganic substrates
The deepest habitat capable of supporting prokaryotic life is in the terrestrial subsurface where it can reach depths up to 4 km, as mentioned in the text. For marine environments, the deepest environment is the Marianas Trench, which is around 10.9 km deep, and possibly 5-6 km deeper before the temperature reaches the limiting factor of having an average temperature of 125 degrees celsius.
The highest habitat capable of supporting prokaryotic life is at 57-77 km in the atmosphere, but the abundance is relatively low because the oxygne concentration at that height is relatively low. There could also be a low amount of water vapour, which maybe needed by those prokaryotes.
The vertical distance of the Earth’s biosphere is measured to be around 57-77 km in the atmisohere due to prokaryotes being detected at those heights.
It was determined by dividing the number of days in the year (365) by the turnover time and then multiplying it by the population size.
Example (for Marine Heterotrophs living above 200m ; Turnover time = 16 days; Population size = 3.6 x 1028):
\[\frac{365 \ days/year}{16 \ days} \cdot{3.6\times 10^{28}} \ Cells = 8.2\times 10^{29} \ Cells/year\]
A large number of these prokaryotes are lysed by marine viruses.
Carbon efficiency = 20%
5-20 fg C/cell
\[3.6\times 10^{28} \ cells \cdot{6\times 10^{-30}} \ Pg \ C /cell = 0.72 \ Pg \ C\] Pg C in Marine Environments
\[0.72 \ Pg \ C\times{4} = 2.88 \ Pg \ C\]
Tectonic movement along with photochemical reactions in the atmosphere provide a flow of substrates to organisms and can remove metabolic products through large scale processes, creating geochemical cycles.This allows elements and molecules to interact, forming bonds that can be formed and broken in a cyclical fashion. Abiotic processes generally include the acid/base chemistry (proton transfer), the geophysical processes that bring about chemical reactions. Biotic processes are brought about by the photochemical reactions, which drive primary production and also the cycling of nutrients such as N and P. This primary production can support heterotrophic bacteria that can respire organic matter and remineralize inorganic substrates that contain N, S, and P. These biotic processes also contribute to oxygen production and depletion.
Biogeochemical: (Redox)
The Earth’s redox state is considered an emergent property as it is reliant on reactions that occur in the Earth’s (bio)geochemical processes that occur due to regular chemical redox reactions and the metabolic processes of microbes. No single organism in these processes possess all the components that is needed to run these systems, and it is instead the reactions that occur, which are connected to each other that make up the processes as a whole.
Reversible electron transfer reactions give rise to new element and nutrient cycles as these cycles would have been almost impossible given the thermodynamic circumstances that can be corrected by this process. More so, the thermodynamic conditions in the environment can influence the rate of each reaction, determining whether it is spontaneous or not (ex: abundance of substrates, products, etc.).
The different stages of the nitrogen cycle all require different amounts of oxygen , corresponding to different redox niches, and the other microbes that will be in that environment. In nitrogen fixation, nitrogen gas is fixed from our atmosphere into ammonium through the use of nitrogenase, which must be protected from oxygen. Another source of fixed nitrogen is through the oxidation of ammonium to nitrite and finally nitrate in aerobic environments and provides nitrates to phytoplankton. On the other hand, denitrification, the anaerobic oxidation of nitrate and nitrite to nitrogen gas occurs in oxygen minimum zones in the ocean.
The nitrogen cycle is indirectly connected to global warming. All microbes require nitrogen to synthesize protein and nucleic acids as they contain amino acids and nitrogenous bases respectively. A major mode of N2 fixation is through microbes, although the Haber-Bosch process contributes to a significant portion as well. The nitrogen cycle is what controls the amount of available fixed nitrogen to organisms, which can influence the abundance of microbes how how they affect the atmosphere, especially through photosynthesis. A shift in the nitrogen cycle could have big implications on not just the microbes, but on the community itself and the overall climate, albeit not directly since it only deals with nitrogen and hydrogen gas, which are emitted through nitrogen cycle processes.
Although genes may mutate fairly often, there will always be a set of core genes that remain and are protected since they are extremely important. These genes code for the major redox reactions essential for life and biogeochemical cycles. Therefore, microbial diversity with new genes that pop up does not necessarily mean there is diversity in metabolic protein since they can be crowded out by those genes that have been conserved.
Given that there are almost limitless ways to mutate genes, this leads to the hypothesis of having limitless evolutionary diversity in nature. This coupled with the fact that the rate of discovery of unique protein families has been proportional to the sampling effort, means that our sampling effort is the limiting factor and the rate at which we discover new protein families scales approximately linearly with the number of new genomes sequenced.
Prompt: “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental it provides.”Do you agree or disagree with this statement? Answer the question using specific reference to your reading discussions and content from evidence worksheets and problem sets.
As microbial life has existed for at least 3.5 billion years, based on undisputed evidence, they have inhabited the Earth for a significant portion of its 4.6-billion year life, shaping many biogeochemical processes over relatively long time scales (1). Microbes cannot be seen with the naked eye, but their sheer abundance, spanning four orders of magnitude from 1026 to 1030 cells in a given marine or terrestrial habitat, means that they contribute significantly to the large scale biogeochemical cycles on Earth, some of which still exists today in some form (2). With the large abundance in microbial cells in these habitats also comes great genetic diversity, which has evolved as genes are conserved and disappear over the billions of years that microbes have existed, as changes in the Earthâs systems happen over time. Humans have inhabited the Earth at a point in time that is relatively very recent on the geologic time scale compared to the emergence of microbes, with the large-scale changes in the Earth’s systems coming at an even more recent time of around 200 years ago, in a period that has been labelled as the “Anthropocene” (3). As microbes have continuously shaped the biogeochemical landscape of the Earth in the past 3.5 billion years, and still play an integral role in the biogeochemical pathways that still exist and effect the Earth’s atmosphere, land and oceans, it could be said that us humans rely on microbes significantly more than microbes rely on us for survival.
Prior the existence of microbial life appearing about 3.5 billion years ago, the Earth was constantly bombarded by meteorites, which ejects rocks into the atmosphere, and vaporizes the water (1). The resulting water vapour created a greenhouse effect on Earth, keeping the temperatures high at a time when the sun was about 25-30% dimmer, with a water temperature of around 100 degrees C (1). Although this temperature is too high for most forms of life to live, there could possibly be thermophilic prokaryotes that could survive under these conditions. Furthermore, there could be tectonic activity as well, ejecting magma that could acidify the already hot water, further making it more uninhabitable by most organisms (1). These organisms that survive were integral in shaping the biogeochemical landscape of the early Earth through their redox reactions and have laid the ground work for future shifts changes in the Earthâs atmosphere and landscape.
Based on evidence from the Isua Belt in Greenland, the atmosphere could have also consisted of carbon dioxide (CO2) based on the inorganic carbon (carbonate) being formed on water-lain sedimentary rocks (1). Given that the Sun was dimmer, meaning less light reaching the Earth and therefore, cooler temperatures, the potent greenhouse gas produced by methanogenic prokaryotes (methanogens) could have trapped enough heat to support life (4). The anoxic CO2 atmosphere provided the inorganic carbon to be reduced by hydrogen gas (H2), yielding methane, as catalyzed by enzymes produced by methanogens (4). However, the rise in temperature from the greenhouse effect of methane cannot happen indefinitely as there is a negative feedback mechanism, where by an organic haze, produced when methane far exceeds CO2 in the atmosphere (4). This reduces the sunlight that gets through to the Earth’s surface, and can cool down the Earth. This CO2 could also be used by 1,5-bisphosphate carboxylase-oxygenase (Rubisco), an enzyme used to fix inorganic carbon, specifically CO2 in a process known as anoxygenic photosynthesis (1, 5). These processes that take up inorganic carbon leave behind carbonate, which is precipitated out of solution.
The gradual loss of H2 lead to a less reducing environment, allowing for the proliferation of a more oxidizing atmosphere as less methane is produced and reverse reaction of methanogenesis is favoured, resulting in an increase in CO2 (4). The evolution of cyanobacteria coincided with this around 2.7 billion years ago, where the oxygen content of the atmosphere increased (1). As more oxygen appeared, so did more complex eukaryotes, which have incorporated their prokaryotic symbionts as mitochondria and chloroplasts (4-Kasting). About 500 million years after, the Great Oxygenation Event happened, drastically increasing the oxygen content in the atmosphere to one that is similar to today’s atmosphere (4). As more CO2 is now in the atmosphere, Rubisco became more prevalent in the fixation of inorganic carbon and due to itâs preference to take up CO2, there is now more oxygen relative to CO2. This photosynthetic pathway has survived for a couple billion years and the microbial fraction of photosynthesis accounts for a sizeable proportion of the oxygen produced, especially from marine sources (4).
The supply of fixed nitrogen has also changed significantly since the Earth was formed, where nitrogen gas (N2) was initially fixed by relatively slow processes such as lightning fixation (1). As microbes started existing, they started deriving energy from redox reactions, providing selective pressure in favour of those organisms that could fix the very abundant N2 using reducing substrates under anaerobic conditions at the time (5). However, the Earth became more oxygenated over time and the enzyme involved, nitrogenase, is inhibited (5). This then led to greater selective pressure on nitrogen-fixing cyanobacteria, causing them to evolve ways to protect nitrogenase from oxygen (1). This fixed nitrogen in the form of nitrate (NO3-) or ammonium (NH4+) can be used by organisms to be incorporated into nucleic acids and proteins, which are essential for growth and life.
However, as this is a cycle, there are also processes that result in the loss of fixed nitrogen. as nitrogen gas (N2) or nitrous oxide (N2O), whether it be through denitrification, or annamox catalyzed by microbes (5). As these processes are anaerobic, they usually happen in the absence of oxygen, they happen in zones in the ocean called Oxygen Minimum Zones (OMZs) (6). This causes the fixed nitrogen to be lost to the atmosphere as N2. Meanwhile, the N2O released into the atmosphere acts as a greenhouse gas, further intensifying climate change. This leads to further increases in the atmospheric CO2 and temperature, reducing the solubility of oxygen and therefore dissolved oxygen concentration in the ocean, causing the expansion of these OMZs. As more fixed nitrogen is lost, the phytoplankton responsible for a significant portion of the primary production in the ocean and world will become even more nutrient limited. As there is less oxygen from primary productivity, this further exacerbates climate change in the process (7). It is unsure if these processes have evolved over time, but since these processes must start from a more oxidized form of nitrogen, such as NO3- or nitrite (NO2-), they would have evolved after the Earth was significantly more oxygenated 2.3 billion years ago (5).
Microbial cells are microscopic in size and only contain a relatively very small amount of carbon in each cell, but they are very abundant in the various habitats on Earth. As microbial abundance is high, there are more individuals that participate in these biogeochemical processes, significantly increasing the scale and output of these processes, making them viable over larger spaces and time scales. With larger spaces, also comes a wide variety of environments such as soil and the oceans that can vary in microbial abundance with different conditions that creates different roles and pathways. Marine environments typically have around 1.5 x 1029 prokaryotic cells and typically contribute a little over 50% of the world’s primary productivity at 51 Pg of carbon per year, largely due to subsurface photosynthetic and heterotrophic prokaryotes (2). Soil has almost double the number of prokaryotic cells at 2.6 x 1029, and they contribute to the break down of organic matter through decomposition, recycling nutrients for other living organisms to grow off (2).
Although the subsurface microbial abundance (both marine and terrestrial subsurface) is hard to measure given the difficulty with collecting samples, it has been estimated that the subsurface microbial abundance is of a magnitude greater than both marine and soil prokaryotes combined and these are mostly anaerobic bacteria that live in the sediments under the sea or terrestrial surface that reduce inorganic carbon to methane (CH4) or sulfate to hydrogen sulfide (H2S) (2). These pathways have evolved from several billion years ago when methanogenesis was common in the anaerobic atmosphere, and appears to have been conserved (1).
Given that microbes have been on Earth for the past 3.5 billion years or longer, today’s metabolic pathways that microbes partake in have survived the several billions of years of evolutionary pressures that arise along the way in a changing biogeochemical landscape. These evolutionary pressures have come in the way of larger scale changes, like the Great Oxygenation Event, as well as smaller scale perturbations that both contribute to refining these microbial pathways through selection by horizontal and vertical gene transfer, making microbes “guardians of metabolism”â" (8).
On the other hand, the modern human having only been around for around 200,000 years (3, 9), a relatively very small amount of time on the geologic time scale. We have largely depended on microbes to shape the Earth the way it has for the past several billion years, including the biogeochemical cycles and landscape that we take for granted through the microbial control of electron flow between reducing (8). Although microbes do play a role in climate change, it largely depends on anthropogenic activities that release increasingly large amounts of CO2 into the atmosphere that the biogeochemical cycling of microbes could partially mitigate through photosynthesis and increased nutrient availability. However, it is the increase in CO2 also diminishes the ability of microbes to cycle the nutrients that photosynthetic organisms need to produce oxygen and biomass.
In the past 200 years, humans have accelerated the biogeochemical changes on Earth, starting with the Industrial Revolution. This time period is referred to as the “Anthropocene” and it corresponds to a time where atmospheric CO2 and other greenhouse gases have risen significantly due to anthropogenic activities through the burning of fossil fuels and deforestation (9). During this time, the human population also increased significantly, and therefore resource usage (3). With this drastic increase in CO2 levels over a relatively short time scale, the carbon cycle has been disrupted and the microbes probably have not had enough time to fully refine their pathways to mitigate the effects of this.
Although humans have developed a process to fix nitrogen industrially, called the Haber-Bosch Process, this only provides fertilizer to agriculture in the form of ammonium, which can be nitrified by bacteria into NO3¬- (5). Although this process does not require the catalysis of microbes, the downstream effects in the oceans do involve them, causing eutrophication from the nitrate that reaches the oceans and creating larger OMZs in the oceans due to higher decomposition of more phytoplankton (5). This reduces the amount of fixed nitrogen available due to the imbalance between nitrification and denitrification in these OMZs and would end up reducing primary production due to nitrogen limitation. Furthermore, N2O is also produced in the process and is released into the atmosphere, acting as a greenhouse gas that can worsen the effects of climate change (5).
Overall, microbes have a sizeable impact on the biogeochemical landscape of the Earth due to the relatively long time they have inhabited and shaped the Earth, as well as their high abundances in different environments coming together to increase the scale of their pathways. They directly influence the nutrient availability for other organisms such as humans, through nitrogen fixation and photosynthesis in pathways that have been refined over their several billion years of existence. As humans started appearing relatively recently on the geological time scale at 200,000 years ago (9), combined with the fact that we have only been changing the biogeochemical landscape for the past 200 years (3), microbes could live without us given their age, continuing with their biogeochemical processes as normal. However, given the rate at which technological advances are occurring in the Anthropocene, could there some way we could greatly reduce our reliance on microbes in the future and shape the biogeochemical landscape of the Earth over a shorter, more viable time scale given that a lot of processes catalyzed by microbes are of a larger scale?
References:
Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578-6583.
PMC33863
Kasting JF, Siefert JL. 2002. Life and the Evolution of Earth’s Atmosphere. Science. 296(5570):1066-1068. 10.1126/science.1071184
Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature. 409(6823):1083-1091. 10.1038/35059210
Falkowski PG, Fenchel T, Delong EF. 2008. The Microbial Engines That Drive Earth’s Biogeochemical Cycles. Science. 320(5879):1034-1039. 10.1126/science.1153213
Waters CN, Zalasiewicz J, Summerhayes C, Barnosky AD, Poirier C. Gluszka, Cearreta A, Edgeworth M, Ellis EC, Ellis M, Jeandel C, Leinfelder R, McNeill JR, Richter DD, Steffen W, Syvitski J, Vidas D, Wagreich M, Williams M, An Z, Grinevald J, Odada E, Oreskes N, Wolfe AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science. 351(6269):aad2622-1-2622-10. 10.1126/science.aad2622
Proteorhodopsin needs retinal to function, but does not need an exogenous pigment to be added when added to a heterologous host since synthesizing retinal requires its precursor, ??-carotene and that produces the orange/red colour.
Proteorhodopsin can be transferred vertically and can be well conserved, but can easily be transferred through horizontal gene transfer since there only being 7 genes needing to be transferred. These genes are required to catalyze the biosynthesis of retinal from β-carotene.
There is a fitness benefit from this since they can pump protons, creating a proton gradient to produce more ATP in the presence of light.
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
2016: * 89 Bacterial Phyla * 20 Archaeal Phyla * Phyla was found using the 16S rRNA databases * Could be up to 1500 due to microbes that live in the “Shadow Biospere”
2003: * 26 of 52 major bacterial phyla have been cultured
Metagenomic sequence binning is the proces from grouping sequences that come from a single genome. * Algorithm types * Align genome to database (BWA) * Group to bins based on DNA characteristics based on DNA characteristics: GC Content, Codon usage * ** Risks and opportunities in binning**
Martinez A, Bradley AS, Waldbauer JR, Summons RE, DeLong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Nat Acad Sci U S A. 104(13):5590-5595. 10.1073/pnas.0611470104
Wooley JC, Godzik A, Friedberg I. 2009. A Primer on Metagenomics. PLoS Comput Biol. 6(2):e1000667. 10.1371/journal.pcbi.1000667
library(phyloseq)
library(knitr)
library(kableExtra)
library(tidyverse)
#Enter in subsample data
subsample_bag3 = data.frame(
Species_number = c(seq(1,32, by=1)),
Species_type = c("MMs","MMs","MMs","MMs","MMs","MMs",
"Skittles","Skittles","Skittles","Skittles",
"Skittles","Gummy_Bears","Gummy_Bears","Gummy_Bears",
"Gummy_Bears","Gummy_Bears","Gummy_Bears","Rods","Rods",
"Rods","Rods","Rods","Circular","Filamentous",
"Filamentous","Filamentous", "Brick","Brick",
"Round","Round","Round","Round"),
Species_name = c("Orange", "Yellow","Green","Red", "Brown","Blue",
"Yellow","Orange","Red","Brown","Green",
"Orange", "Pink","Yellow","White","Red","Green",
"Yellow","Orange","Pink","Green","Red",
"Blue_Swirl", "String_like","Spider",
"Coke_Bottle","Big_blue","Small_blue","Green", "Yellow",
"Red","Purple"),
Abundance = c(13,8,4,3,4,12,6,7,4,10,8,1,3,2,4,3,4,5,8,5,2,4,
1,1,1,1,1,1,1,1,1,1)
)
subsample_bag3 %>%
kable("html") %>%
kable_styling(bootstrap_options = "striped", font_size = 10,
full_width = F)
| Species_number | Species_type | Species_name | Abundance |
|---|---|---|---|
| 1 | MMs | Orange | 13 |
| 2 | MMs | Yellow | 8 |
| 3 | MMs | Green | 4 |
| 4 | MMs | Red | 3 |
| 5 | MMs | Brown | 4 |
| 6 | MMs | Blue | 12 |
| 7 | Skittles | Yellow | 6 |
| 8 | Skittles | Orange | 7 |
| 9 | Skittles | Red | 4 |
| 10 | Skittles | Brown | 10 |
| 11 | Skittles | Green | 8 |
| 12 | Gummy_Bears | Orange | 1 |
| 13 | Gummy_Bears | Pink | 3 |
| 14 | Gummy_Bears | Yellow | 2 |
| 15 | Gummy_Bears | White | 4 |
| 16 | Gummy_Bears | Red | 3 |
| 17 | Gummy_Bears | Green | 4 |
| 18 | Rods | Yellow | 5 |
| 19 | Rods | Orange | 8 |
| 20 | Rods | Pink | 5 |
| 21 | Rods | Green | 2 |
| 22 | Rods | Red | 4 |
| 23 | Circular | Blue_Swirl | 1 |
| 24 | Filamentous | String_like | 1 |
| 25 | Filamentous | Spider | 1 |
| 26 | Filamentous | Coke_Bottle | 1 |
| 27 | Brick | Big_blue | 1 |
| 28 | Brick | Small_blue | 1 |
| 29 | Round | Green | 1 |
| 30 | Round | Yellow | 1 |
| 31 | Round | Red | 1 |
| 32 | Round | Purple | 1 |
collectors_curve = data.frame(
x = c(seq(1, 130,by = 1)),
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,
3,3,3,3,4,4,4,5,5,5,5,6,6,6,6,6,6,6,6,6,6,
6,6,7,7,7,7,7,7,8,8,8,8,8,8,8,9,9,9,9,10,10,10,
10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,12,
13,13,13,14,14,15,15,15,15,16,16,16,17,17,17,17,
18,18,18,18,18,19,19,19,19,19,19,19,19,20,20,20,20,
20,21,21,22,22,22,22,23,24,25,26,27,28,29,30,31,32)
)
ggplot(collectors_curve, aes(x = x, y = y)) +
geom_point() +
geom_smooth()+
labs(x="Cumulative number of individuals classified",
y="Cumulative number of species observed", title = "Collector's Curve of a subsample of Bag #3")
## `geom_smooth()` using method = 'loess'
#Invsimpson for M&M species
1/((13/44)^2+(8/44)^2+(4/44)^2+(3/44)^2+(4/44)^2+(12/44)^2)
## [1] 4.631579
#Invsimpson for Skittle species
1/((6/35)^2+(7/35)^2+(4/35)^2+(10/35)^2+(8/35)^2)
## [1] 4.622642
#Invsimpson for Gummy Bears species
1/((1/17)^2+(3/17)^2+(2/17)^2+(4/17)^2+(3/17)^2+(4/17)^2)
## [1] 5.254545
#Invsimpson for Rod species
1/((5/24)^2+(8/24)^2+(5/24)^2+(2/24)^2+(4/24)^2)
## [1] 4.298507
#Invsimpson for Circular species
1/((1/1)^2)
## [1] 1
#Invsimpson for Filamentous species
1/((1/3)^2+(1/3)^2+(1/3)^2)
## [1] 3
#Invsimpson for Brick species
1/((1/2)^2+(1/2)^2)
## [1] 2
#Invsimpson for Round species
1/((1/4)^2+(1/4)^2+(1/4)^2+(1/4)^2)
## [1] 4
#Calculate inverse simpson for whole sample
1/((13/130)^2+(8/130)^2+(4/130)^2+(3/130)^2+
(4/130)^2+(12/130)^2+(6/130)^2+(7/130)^2+
(4/130)^2+(10/130)^2+(8/130)^2+(1/130)^2+
(3/130)^2+(2/130)^2+(4/130)^2+(3/130)^2+(4/130)^2+
(5/130)^2+(8/130)^2+(5/130)^2+(2/130)^2+(4/130)^2+
(1/130)^2+(1/130)^2+(1/130)^2+(1/130)^2+(1/130)^2+
(1/130)^2+(1/130)^2+(1/130)^2+(1/130)^2)
## [1] 19.18275
## Calculate Chao1 - Singletons = 11; Species observed >2 times= 21
32 + ((11^2)/(2*21))
## [1] 34.88095
#Calculate diversity using Vegan
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-6
subsample_bag3_diversity =
subsample_bag3 %>%
select(Species_number, Abundance) %>%
spread(Species_number, Abundance)
#Subsample inverse simpson using Vegan
diversity(subsample_bag3_diversity, index ="invsimpson")
## [1] 19.161
#Subsample chao1 using Vegan
specpool(subsample_bag3_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 32 32 0 32 0 32 32 0 1
These calculations are different from the previous calculations by 0.02 for inverse simpson and 2.88 for chao1.
The measure of diversity depends on the definition of species in the sample since, especially chao1 since the number of species can influence the observed number of species, along with the number of singletons in the sample.
Removing the colour separation would significantly decrease the number of species, combining the skittles and M&Ms would also decrease the number of species.
Different sequencing technologies can introduce biases that can overrepresent one species relative to another. Also, there could be varying error rates in different sequencing technologies, which can affect the categorization of species, affecting the number of species.
Prompt: Discuss the challenges involved in defining a microbial species and how HGT complicates matters, especially in the context of the evolution and phylogenetic distribution of microbial metabolic pathways. Can you comment on how HGT influences the maintenance of global biogeochemical cycles through time? Finally, do you think it is necessary to have a clear definition of a microbial species? Why or why not?
Defining a species at the microbial level is difficult given their relatively small genome size, which can range from a mere 0.2 to a significantly larger 13.0 megabases, depending on the microbe’s lifestyle (1, 2). Using E. coli as an example, the sequence similarity required to be classified the same species has been shown to vary greatly, complicating the definition of a species (3) Horizontal Gene Transfer can also complicate these phylogenetic relationships between organisms, creating incongruent relationships, whereby there are no partitions/taxonomic groups in common (4). As microbial genome sizes are small, they are under evolutionary pressure to become more streamlined, losing genes in the process (5) However, these genes that are selectively lost from the organism are not lost from the community, as they can be transferred to other organisms via Mobile Genetic Elements (MGE) (6). Gene composition changes in microbial genomes through Horizontal Gene Transfer can accumulate over time with changing environmental evolutionary pressures over time, which can change the biogeochemical landscape of ecosystems as specific pathway genes are gained and lost (7.).
Horizontal Gene Transfer is the driver of microbial evolution, where it can alter the relatively small microbial genome of a certain species, leading to microbial strains of different lengths and consisting of drastically different genes (3, 4). As these genomes can potentially be quite distinct from each other in the pathways that are expressed, there is the challenge of having a clear definition of a microbial species where one strain can be very different functionally. For example, in Welch et al., 2002, the sequence similarity between the three E. coli strains sequenced in the study was found to be 39.2%, which is significantly lower than the 99.5% sequence similarity between human genomes (8). In the context of E. coli, each strain can contain different pathogenicity islands in their genomes that can alter their disease potential, and also their host range based on selective pressures in the environment (3). In addition, antibiotic resistance provides another avenue of selective pressure on E. coli strains, where only the resistant bacteria will survive in the presence of a certain antibiotic that they are resistant to. These antibiotic resistance genes primarily come from Horizontal Gene Transfer, through which it can be rapidly acquired as it would be required for survival when the antibiotic is present (9).
However, in the context of biogeochemical cycling, the selective pressures lean towards the energetic and nutritional requirements of microbes that can change under stress from environmental perturbations, where genes are transferred to other microbes in the community based on which pathways are favourable in a given environment (9). These pathways can be distributed across a wide range of microbes as they were drawn from a primitive global gene pool (9). The influence imposed on genes by the environment also favours certain genes that are present across different microbial species, where the favourable genes are transferred throughout the community, such as the transfer of methanogenesis, sulfur reduction, and (de)nitrification genes (9). Furthermore, if the certain pathway continues to be functionally and energetically favourable in the environment due to a lack of biogeochemical changes in the environment, then it’s retention would be also be governed by environmental selective pressures that could affect the survival of members within a microbial community. This could differentially distribute biogeochemical pathways between communities, such as in the oceans where nutrient and oxygen concentrations vary with depth (10, 11). Similarly, different microhabitats exist within hydrothermal vent communities, where the distance from the vent can drastically change the biogeochemical pathways as the environment becomes less harsh and reducing (11, 12).
Horizontal Gene Transfer has been suspected due to changes in codon usage frequencies and drastically different genes were found in different pathogenicity islands that are inserted at certain regions in the genome (3). As genes of different lengths are inserted, this can affect genome length differently, depending on whether it is a single gene, or an entire pathway that is inserted or substituted, which can reduce the similarity between the different strains of E. coli. Furthermore, there could be function-dependent biases for certain genes that increase the rate of Horizontal Gene Transfer, as those genes could possibly confer an advantage, whether it be in the ecology or pathogenicity of a microbe (13). Analysis of Epsilon-proteobacterial genomes has shown that hydrothermal vent microbes can have very similar virulence genes as their human/animal pathogenic counterparts, but their ability to easily acquire new genes causes them to diverge based on selective pressures in the environment they inhabit (14).
Genes that are acquired by Horizontal Gene Transfer can be accumulated over time through insertions and substitutions that could increase the activity of a certain pathway or introduce new genes and/or pathways. This could possibly confer a fitness advantage to the microbe as it creates a niche, or increase its abundance within the community. However, this should come at the cost of other unused genes given that organisms that accept genes via Horizontal Gene Transfer generally have a smaller genome, which is the result of genes being lost (5). Furthermore, selecting for certain genes allows the genome to be more streamlined, especially useful in the ocean, where resources can be limiting in certain regions (5). Over time, when the new horizontally acquired genes accumulate, new pathways emerge in a community, and only the pathways that can survive the environmental perturbations that appear over time will be conserved and be distributed to other organisms in the community. These environmental perturbations can cause changes in larger scale biogeochemical cycles that will add up and cause genes to diverge over longer time scales (5).
The mechanisms that can be used by Horizontal Gene Transfer include transformation, conjugation and transduction, in which Mobile Genetic Elements (MGEs) help facilitate. MGEs can include plasmids, bacteriophages, transposons and integrative conjugative elements (ICEs) that can move DNA from one spot to another on the genome or to other organisms that not be closely related (6,15). Prokaryotes use an operon to organize the genes with a single promoter at the start, which are highly conserved. However, increasing the distance from the promoter increases the likelihood of gene insertion (6). This creates hotspots where new genes are more likely to be integrated to the genome whether it be through the introduction of prophages or the ICEs that can be several hundred base pairs long, and can be self managed using the genes encoded within them (6).
As genome lengths can vary drastically within the same species, this complicates the process of defining the criteria for a microbial species, even with a conserved set of genes within a species. It is thought that genome lengths can affect the number of MGEs and ICEs, increasing Horizontal Gene Transfer rates with a larger genome size (2). Furthermore, as larger genomes have more genes that originated from Horizontal Gene Transfer, this would also increase the degree of phylogenetic incongruence as there are now more foreign genes incorporated (2). Also, a larger, more complex genome can also be more modular, increasing organization and reducing the likelihood of deletions, leading to more successful Horizontal Gene Transfers and integration (2).
Phylogenetic incongruence from Horizontal Gene Transfer can also introduce a source of complexity when trying to determine species relationships on the phylogenetic tree, where a gene can jump from one branch/leaf to another, distorting the phylogenetic relationships on the tree. This can make it hard identifying species similarity, as similar species are now in a different branch on the tree where they share the same sequences (4). Although microbes are connected to each other on the branch of the phylogenetic tree, even if they have sequence similarity by virtue of being the same species, doesn’t not guarantee they would have the same metabolic pathways (4).
Given that Horizontal Gene Transfer can drastically alter the genetic makeup within different strains of a species, it may seem difficult and impossible to determine a single definition to differentiate species, especially in the context of microbial genomes. However, if a set of criteria defining what constitutes a species can be used to reliably determine the conserved genes across all strains within a species, then that could be used to clearly define and differentiate microbial species. As long as two organisms share a certain degree of sequence homology that is common across all strains, then they can be classified as the same species, even being as low as 39.2% for E. coli (3). This would help with the classification of microbes into their species as it adds an extra filter, even Horizontal Gene Transfer increases the complexity of the microbial genome.
In conclusion, Horizontal Gene Transfer is one of the main drivers behind evolution, constantly changing microbial genomes through evolutionary selective pressures that can introduce new pathways or conserve existing ones. This also has the power to overhaul the genome sizes of certain strains by way of the insertion of MGEs and/or ICEs into the genome at higher frequencies certain hotspots, as well as the deletion of unused genes (6). This all decreases the sequence similarity thresholds for microbial species as native sequences are replaced by foreign ones inserted by Horizontal Gene Transfer and increases the within-species variability. With a higher genome complexity and within-species variability also comes with higher uncertainty with phylogenetic relations as phylogenetic incongruence increases. Therefore, an universally accepted standard for differentiating microbes at the species level should be the aim, where an exact percentage threshold is not used. This would make it easier to form phylogenetic relations between strains and have a solid definition to base future research on. Also, further research can be done on how genome size, in conjunction with gene function, affects the rate of Horizontal Gene Transfer within a community (2), so we can better understand how they are selected based on their biogeochemical activity in the environment.
References:
Welch RA, Burland V, Plunket III G, Redford P, Roesch P, Rasko D, Buckles EL, Liou SR, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Nat Acad Sci U S A. 99(26):17020-17024. PMC139262
Haya Abuzuluf
Jack Anthony
Judy Ban
Ryan Nah
Ryan Lou
Sawera Dhaliwal
Water samples from various depths of Saanich Inlet, a model ecosystem for the effects of growing oxygen minimum zones in the open ocean, were analyzed via 16S iTag amplicon sequencing and processed using both mothur and QIIME2 independently. The correlation between changes in abundance of Sulfurimonas, a genus encompassing several species of sulfur-oxidizing Epsilonproteobacteria, with oxygen, sulfide, and nitrate concentrations was determined by use of a linear regression model in R. Statistically significant associations with likely biological relevance were discovered using mothur but not for QIIME2. Statistically significant correlations between individual OTUs and ASVs with various nutrient concentrations were similarly discovered, with more identified by mothur than QIIME2. Sequence processing with mothur and QIIME2 arrived at very different conclusions, suggesting that the two fundamentally different analysis philosophies lead to very different results. However, the statistical and methodological weaknesses of this study do not enable strong claims for or against the use of either analysis pipeline. We discuss these downfalls and possible directions for future work in this area to build on our results.
Saanich Inlet is a seasonally anoxic fjord located off the southeastern coast of Vancouver Island, British Columbia [1]. During fall and winter, strong winds mix water from the Strait of Georgia into the inlet and replace bottom water [2]. As the weather gets calmer in spring, deep water in the inlet is retained by a shallow sill near the inlet mouth and increasing stratification. High levels of primary productivity export organic matter from the euphotic zone leading to oxygen loss with depth, caused by microbial remineralization of said organic matter. This organic matter fueled respiration is sufficient to create hypoxic conditions below 100m and anoxic conditions below 150m [1]. Saanich Inletâs predictable, recurring anoxia presents an excellent model exosystem for the study of processes occurring in other anoxic marine environments all around the world.
Organic carbon is used by heterotrophic bacteria as an energy source in aerobic respiration. As oxygen availability drops, methane, ammonia, and hydrogen sulfide build up as the products of anoxic metabolism [1]. Globally, this process has the effect of creating bands of hypoxic water between 100m and 1000m in the open ocean. These oxygen minimum zones (OMZs) are expanding in all oceans and are expected to continue expansion as a consequence of anthropogenic climate change [3]. In Saanich Inlet, the seasonal flushing of new water into the deep removes the buildup of metabolites and refreshes the supply of oxygen each winter. The predictable and recurring nature of the physical and chemical cycling of Saanich Inlet makes it an ideal model ecosystem to study the biological processes that occur in oxygen minimum zones (OMZs) globally. The species present in a particular water sample are largely influenced by the availability of specific terminal electron acceptors (TEAs). Saanich Inlet offers an annually reset gradient of changing TEAs with depth, making it an excellent system to study the change of community structure in relation to depth or oxygen.
Operational taxonomic units (OTUs) are a representation of taxonomic grouping used for both metagenomic and 16S amplicon sequencing data, where genetic sequences which fall within some limit of similarity are clustered together into an OTU [4]. While this serves to overcome the problem of sequencing error causing the true sequence to be masked by grouping of the similar sequences, if the similarity cutoff is not sufficiently stringent, multiple species may be clustered together. Similarly, if it is too strict, a single species may be split up into multiple OTUs. Additionally, as OTUs represent a cluster of a number of distinct sequences generated for a specific data set, OTU abundances are not comparable across studies and datasets. Despite these issues, for the purposes of microbial community analysis, OTUs generated at the 3% sequence similarity cutoff have long been the de facto proxy for species identity.
Recently, an alternative to OTUs has been presented; namely, the amplicon sequence variant, or ASV [5]. ASVs are determined under the idea that the true sequence would appear more commonly than would an erroneous sequence, and thus only the sequences thought to be real sequences in this way are retained, disregarding all erroneous sequences. While this leads to discarding large amounts of sequence information, the resulting true sequences are comparable between data sets as a specific genetic sequence is associated with any given ASV. Though ASVs represent an actual sequence independent of a given dataset which is arguably superior to that of an OTU definition, whether their use confers advantages over OTUs has not been conclusively determined presently.
Sulfurimonas was selected as our taxon of interest due to its unusual metabolic pathways and the effects said pathways have on the environment. This bacterium can have either spiral or curved rod shaped cells and has one or two flagella for motility. These chemolithoautotrophs reduce nitrate and nitrite using sulfur compounds or H2 as electron sources [6]. Such a coupling of nitrogen and sulfur cycles in a single bacterium suggests a fascinating interplay between abundance and TEA availability. Sulfurimonas would be expected to increase in abundance with nitrate and oxidized sulfur concentrations, as well as decrease in abundance in the presence of oxygen, as it may be outcompeted by more organisms that find the surface environmental reduction potential more favourable. However, interactions with other species in the anoxic region, combined with limiting nitrate gradients near the bottom may cause the abundance to peak at intermediate depths, before declining in the deepest part of the water column.
The environmental samples used in this project were collected through time-series monitoring in Saanich Inlet on a monthly-basis aboard the MSV John Strickland at station S3 (48°35.500 N, 123°30.300 W) [1]. Samples for large volume (LV) SSU rRNA gene tags, metagenomics, metatranscriptomics, and metaproteomics were taken from six major depths spanning the oxycline (10, 100, 120, 135, 150, 165, and 200m). Large volume waters were collected in 2x12 1 Go-Flow bottles on a wire and gathered into 2L Nalgene bottles with sterile silicon tubing immediately following sampling for dissolved gases to minimise changes in microbial gene expression. A 0.22 μm Sterivex filter was used to collect biomass from collected water samples. Genomic DNA was then extracted from these filters and used to generate small subunit ribosomal RNA (SSU or 16S/18S rRNA) gene pyrotag libraries. PCR amplification targeting the V4-V5 region of SSU rRNA gene was performed to generate iTag datasets or amplicons. Samples were sequenced according to the standard operating protocol on an Illumina MiSeq platform at the JGI with 2x300bp technology. Using as consistent parameters, sequences were processed through both mothur [7] and QIIME2 [8]. Two phyloseq objects resulted from the processing which were then used in subsequent analyses.
Analysis was completed in R v3.4.3 [9] using the following packages.
library("tidyverse")
library("phyloseq")
library("vegan")
library("corrplot")
## Warning: package 'corrplot' was built under R version 3.4.4
## corrplot 0.84 loaded
Sample data from mothur and QIIME2 were organized and normalized using data.frame() function to make tabular data, where cases were represented in rows and measurements in columns. Relative abundances of our taxa of interest were compared across nutrient gradients using a linear regression model. Linear models describe a continuous response variable as a function of one or more predictor variable [10]. They were used in our analyses since each of the nutrient concentrations is a continuous variable with order.
Figure 1. Nutrient concentration plotted with depth
Overall microbial community structure at the phylum as determined by mothur in terms of taxonomic breakdown changes slightly from 10m to 100m, then is relatively consistent throughout all the lower depths (Fig. 2). Most depths are dominated by a large population of Proteobacteria and Bacteroidetes, the relative population of the latter giving way to the Proteobacteria with depth. Similar observations are seen in the QIIME2-processed data (Fig. 3).
Figure 2. Relative abundance of phyla present in Saanich Inlet based on sample depth as determined by mothur.
Figure 3. Relative abundance of phyla present in Saanich Inlet based on sample depth as determined by QIIME2.
In the mothur-processed data, Chao1 richness indicates that the number of identified OTUs increases until a peak at 100m, then decreasing to the deepest parts of the inlet, starting at 120m with a small increase at 200m (Fig. 4). Species diversity, as measured by the Inverse Simpson index, also peaks at 100m then decreases with increased depth. As oxygen concentrations decreases with depth in Saanich Inlet (Fig. 1), a higher oxygen concentration correlates with both increased richness and diversity in the sample.
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning: Removed 7 rows containing missing values (geom_errorbar).
Figure 4. Alpha Diversity Indicators - Chao1 and Inverse Simpson Index plotted against depth and coloured to compare with [O2] from mothur-processed data.
As for the QIIME2-processed data, Chao1 richness peaks at 120m where as before it decreases with increasing depth, increasing slightly at 200m (Fig. 5). Diversity, on the other hand, peaks at 10m, which then decreases monotonically with depth. Diversity and richness appear less strongly correlated in the QIIME2 dataset. However, the overall trends in richness and diversity of Saanich Inlet are observable regardless of analysis pipeline.
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning in (function (..., row.names = NULL, check.rows = FALSE,
## check.names = TRUE, : row names were found from a short variable and have
## been discarded
## Warning: Removed 7 rows containing missing values (geom_errorbar).
Figure 5. Alpha Diversity Indicators - Chao1 and Inverse Simpson Index plotted against depth and coloured to compare with [O2] from QIIME2-processed data.
In the mothur-processed data, correlation matrix analysis indicates that Sulfurimonas abundance appears to increase with depth, decrease with oxygen and nitrate concentration, as well as increase with sulfide concentration (Fig. 6). Indeed, linear regression of nutrient concentration and Sulfurimonas relative abundance indicate that the concentrations of each of the 3 nutrients significantly relate to Sulfurimonas relative abundance (Fig. 8; p < 0.01 after FDR-correction; Table 1). This overall pattern is observable in the QIIME2-processed data as well, while the correlations differ in strength (Fig. 7) and are not statistically significant when multiple linear regression is performed (Fig. 9; p > 0.05 after FDR-correction; Table 2).
Figure 6. Correlation matrix of nutrient concentration and Sulfurimonas abundance as determined with mothur.
Figure 7. Correlation matrix of nutrient concentration and Sulfurimonas abundance as determined with QIIME2.
| Estimate | Std..Error | t.value | p.value | FDR-corrected | |
|---|---|---|---|---|---|
| (Intercept) | 0.0224803 | 0.0022788 | 9.865069 | 0.0022148 | 0.0044296 |
| O2_uM | -0.0000980 | 0.0000156 | -6.286284 | 0.0081297 | 0.0081297 |
| H2S_uM | 0.0023563 | 0.0002058 | 11.449147 | 0.0014300 | 0.0044296 |
| NO3_uM | -0.0007633 | 0.0001212 | -6.300053 | 0.0080795 | 0.0081297 |
| Estimate | Std..Error | t.value | p.value | FDR-corrected | |
|---|---|---|---|---|---|
| (Intercept) | 0.0809162 | 0.0241215 | 3.354527 | 0.0439111 | 0.1756445 |
| O2_uM | -0.0003616 | 0.0001650 | -2.192073 | 0.1160289 | 0.2320579 |
| H2S_uM | 0.0029215 | 0.0021785 | 1.341047 | 0.2723990 | 0.2723990 |
| NO3_uM | -0.0020102 | 0.0012825 | -1.567472 | 0.2149981 | 0.2723990 |
Figure 8. Relative abundance of Sulfurimonas plotted against nutrient concentration from the mothur-processed dataset.
Figure 9. Relative abundance of Sulfurimonas plotted against nutrient concentration from the QIIME2-processed dataset.
Mothur identified eight OTUs classified as Sulfurimonas, all of which were only identified in the samples at depths below 100m (Fig. 10). As at the taxon level, the abundances of most, of these OTUs increase with depth, and all but one were present at the deepest depth. Similarly, QIIME2 identified seven Sulfurimonas ASVs (Fig. 11), meaning that the richness determined by the two analysis pipelines is roughly comparable. However, fewer ASVs were shown to increase in abundance with depth, and just under half were identified in the 200m sample.
Figure 10. Relative abundances of each Sulfurimonas OTU as determined by mothur, recolored by oxygen concentration at the given depth of the sample.
Figure 11. Relative abundances of each Sulfurimonas ASV as determined by QIIME2, recolored by oxygen concentration at the given depth of the sample.
Multiple linear regression of the mothur-processed data indicated that the abundances of 7 of the 8 identified OTUs significantly correlated following false discovery rate correction with one or more of the tested nutrients (NO3-, H2S, and O2), most commonly sulfide (Table 3). A representative OTU is shown in Figure 12, OTU0308, which showed a significant correlation with all 3 tested nutrient variables. In comparison, QIIME2-processed data showed no significant correlations with either oxygen nor nitrate concentration, and the abundances of only 3 of the 7 identified ASVs significantly correlated with sulfide concentration (Table 4). A representative ASV is shown in Figure 13, ASV1250, which showed a correlation of abundance with sulfide concentration.
| OTU | Variable | Estimate | Std_Error | t_value | p_value | FDR_corrected |
|---|---|---|---|---|---|---|
| Otu0308 | O2 | -0.0000601 | 0.0000080 | -7.4950313 | 0.0049204 | 0.0168698 |
| Otu0308 | H2S | 0.0013338 | 0.0001058 | 12.6015568 | 0.0010776 | 0.0113253 |
| Otu0308 | NO3 | -0.0004953 | 0.0000623 | -7.9496787 | 0.0041516 | 0.0166066 |
| Otu0666 | O2 | -0.0000203 | 0.0000046 | -4.4026333 | 0.0217284 | 0.0474074 |
| Otu0666 | H2S | 0.0000584 | 0.0000609 | 0.9589865 | 0.4083113 | 0.4666414 |
| Otu0666 | NO3 | -0.0001611 | 0.0000358 | -4.4974784 | 0.0205213 | 0.0474074 |
| Otu0704 | O2 | 0.0000061 | 0.0000063 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu0704 | H2S | 0.0008052 | 0.0000834 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu0704 | NO3 | 0.0000623 | 0.0000491 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu0751 | O2 | -0.0000269 | 0.0000055 | -4.8516438 | 0.0167137 | 0.0445698 |
| Otu0751 | H2S | -0.0002618 | 0.0000731 | -3.5797332 | 0.0372935 | 0.0745869 |
| Otu0751 | NO3 | -0.0002206 | 0.0000430 | -5.1242045 | 0.0143892 | 0.0431675 |
| Otu1315 | O2 | 0.0000022 | 0.0000023 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu1315 | H2S | 0.0002899 | 0.0000300 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu1315 | NO3 | 0.0000224 | 0.0000177 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu2793 | O2 | 0.0000005 | 0.0000005 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu2793 | H2S | 0.0000644 | 0.0000067 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu2793 | NO3 | 0.0000050 | 0.0000039 | 1.2682777 | 0.2941783 | 0.4412674 |
| Otu3512 | O2 | 0.0000000 | 0.0000033 | 0.0119645 | 0.9912051 | 0.9912051 |
| Otu3512 | H2S | 0.0000020 | 0.0000439 | 0.0449804 | 0.9669496 | 0.9912051 |
| Otu3512 | NO3 | 0.0000191 | 0.0000258 | 0.7396834 | 0.5131159 | 0.5597628 |
| Otu3610 | O2 | 0.0000005 | 0.0000005 | 0.9602731 | 0.4077575 | 0.4666414 |
| Otu3610 | H2S | 0.0000644 | 0.0000067 | 9.6541059 | 0.0023594 | 0.0113253 |
| Otu3610 | NO3 | 0.0000050 | 0.0000039 | 1.2682777 | 0.2941783 | 0.4412674 |
Figure 12. Relative abundance of OTU0308 versus nutrient concentration, points recolored by depth.
| ASV | Variable | Estimate | Std_Error | t_value | p_value | FDR_corrected |
|---|---|---|---|---|---|---|
| Asv277 | O2 | -0.0003010 | 0.0001398 | -2.1533547 | 0.1203267 | 0.5053720 |
| Asv277 | H2S | -0.0040259 | 0.0018458 | -2.1810766 | 0.1172306 | 0.5053720 |
| Asv277 | NO3 | -0.0019546 | 0.0010866 | -1.7987740 | 0.1698884 | 0.5351818 |
| Asv561 | O2 | 0.0000006 | 0.0000519 | 0.0119645 | 0.9912051 | 0.9912051 |
| Asv561 | H2S | 0.0000308 | 0.0006857 | 0.0449804 | 0.9669496 | 0.9912051 |
| Asv561 | NO3 | 0.0002986 | 0.0004037 | 0.7396834 | 0.5131159 | 0.5671281 |
| Asv578 | O2 | 0.0000150 | 0.0000156 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv578 | H2S | 0.0019946 | 0.0002066 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv578 | NO3 | 0.0001543 | 0.0001216 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1153 | O2 | 0.0000333 | 0.0000347 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv1153 | H2S | 0.0044271 | 0.0004586 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv1153 | NO3 | 0.0003424 | 0.0002700 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1250 | O2 | 0.0000154 | 0.0000160 | 0.9602731 | 0.4077575 | 0.5351818 |
| Asv1250 | H2S | 0.0020433 | 0.0002116 | 9.6541059 | 0.0023594 | 0.0165161 |
| Asv1250 | NO3 | 0.0001580 | 0.0001246 | 1.2682777 | 0.2941783 | 0.5351818 |
| Asv1620 | O2 | -0.0000548 | 0.0000487 | -1.1252336 | 0.3423877 | 0.5351818 |
| Asv1620 | H2S | -0.0007710 | 0.0006433 | -1.1986359 | 0.3167203 | 0.5351818 |
| Asv1620 | NO3 | -0.0002880 | 0.0003787 | -0.7606489 | 0.5021883 | 0.5671281 |
| Asv2216 | O2 | -0.0000702 | 0.0000731 | -0.9602731 | 0.4077575 | 0.5351818 |
| Asv2216 | H2S | -0.0007774 | 0.0009655 | -0.8051467 | 0.4796349 | 0.5671281 |
| Asv2216 | NO3 | -0.0007209 | 0.0005684 | -1.2682777 | 0.2941783 | 0.5351818 |
Figure 13. Relative abundance of ASV1250 versus nutrient concentration, points recolored by depth.
Thus, while the overall trends in abundance across depth, [H2S], [NO3-], and [O2] appear to be similar for mothur- and QIIME2-processed data, many more relationships were found to be statistically significant in the mothur-processed dataset, both at the relative genus level abundances as well as at the OTU/ASV level. The abundances appear to match the fact that the Sulfurimonas genus tends to inhabit anoxic/sulfidogenic regions of marine basins, including oxic-anoxic interfaces and hydrothermal vents [13].
Within the Sulfurimonas genus the richness determined by mothur and QIIME2 processed data was highly comparable. Both mothur and QIIME2 pipeline processed data appeared to reveal the same overall trends of community structure change with respect to nutrient concentration, though the proportion of differences which were statistically significant was far lower in the QIIME2 data. In general, diversity relative to richness was found to rise to a peak, after which it begins to decrease with increasing depth (Figures 4-5). Sulfurimonas abundance increases with increasing depth and sulfide concentration, and decreases with increasing oxygen and nitrate concentration (Figures 6-9).
Most of the trends observed in this study were consistent with our hypotheses. As a chemolithoautotroph, it is expected that Sulfurimonas be most abundant in deep, anoxic waters. Sulfurimonas metabolizes by reducing nitrate and nitrite and oxidizing sulfur containing compounds or hydrogen [5]. Thus, as hypothesized, Sulfurimonas relative abundance was significantly decreased in the presence of oxygen, where it will likely be outcompeted by organisms who can use oxygen as an electron donor. In the same way, Sulfurimonas relative abundance was significantly increased in the presence of sulfide. However, it was unexpected that Sulfurimonas abundance was significantly negatively associated with nitrate concentration. It may be that organisms which utilize nitrate more efficiently are outcompeting Sulfurimonas as at the depths where nitrate is high, the concentration of sulfide is low (Fig. 1). Potentially this association may be caused by other confounding environmental factors that were studied in this analysis such as temperature or microbial metabolites produced by specific microbes adapted to those particular anoxic regions of the water column. Regardless, this observation was contrary to what was expected, and no single explanation can be concluded based on the available data. Evidently, however, the abundance of Sulfurimonas in Saanich Inlet is likely to vary as a function of many nutrient concentrations, not just oxygen.
There appears to be a correlation between stratified layers of the water column and Sulfurimonas distribution. Samples used in this study were taken at a time of year when stratification due to a thermocline was present around 10m-100m. The temperature at 10m is substantially higher than lower depths. An increase in diversity was also observed from the surface to a peak at 10m. It could be hypothesized that the high diversity is due to ambient temperatures being ideal for bacterial growth with minimal limitation. In this study, it was also found that diversity tends to decrease with richness at lower depths, with minimums coinciding with the boundary of the thermocline. The low diversity at depth could also be explained by the high specificity required to thrive in such extreme environments. For instance, our taxon of interest, Sulfurimonas, is commonly found near deep sea hydrothermal vents and functions primarily by sulfide oxidation, a process which is favored by the high concentrations of sulfide in that habitat.
The overall trend of the results of this experiment align with our hypotheses and previous literature [6], and while the conducted statistical tests show that some of the correlations we expected to see are significant, a number of methodological issues make us less confident in the replicability of our results. The use of a linear regression model assumes that the relationships between the tested variables would be linear, which is highly unlikely to be true in a complex biological system. As an example, interspecies competition may make intermediary concentrations of a nutrient such as nitrate more beneficial, as at higher concentrations perhaps denitrifying bacteria would outcompete Sulfurimonas, and it is only at intermediate concentrations that it can find its niche. However, should the concentration be too low, Sulfurimonas would be unable to grow either. Thus, given that there will always be other confounding environmental variables not taken into account in the model, the apparent success of the model is more surprising than not. Future investigations into the subject should consider employing stronger statistical tests which more accurately measure the phenomenon being examined.
We are unable to draw any definitive conclusions regarding which pipeline used in this study is strictly superior. However, clear differences were observed in the results obtained from each, specifically in regards to the Sulfurimonas genus. Both suggested similar overall relationships between nutrient concentration and Sulfurimonas abundance. However, whether the identified relationships were statistically significant proved to be very different between the two analysis pipelines. Due to the aforementioned flaws in the statistical methodology used, we cannot confidently assert whether use of one pipeline is better than the other, but it is clear that the two produce different results, and choice of pipeline is extremely important for microbial ecology. It should be noted that had multiple-comparisons testing not been controlled for, far more test conditions would have yielded significance. This highlights the importance of ensuring statistical rigour of the experimental methodology, as well as the benefit of consulting a statistician during the experimental design process.
As each pipeline approaches the most important analytical step of sequences processing differently, each has its own guidelines and configuration profiles. Hence, choosing the correct pipeline with a set of parameters and algorithms for a given application is important. While for this study the parameters and databases were kept as similar as possible between the two, there are fundamental differences in the philosophy by which the two pipelines handle data. As mothur clusters individual sequences while QIIME2 groups by sample consistency, studies have shown that the effect of sequencing errors yielded a bigger impact on the results than choosing the appropriate gene region for amplification [11]. In addition, selecting a pipeline with higher sequence throughput could increase the chance of richness overestimation [11]. Because the analytical steps are paramount to microbial ecology research and discovery, variations in the quality of databases and their annotations could impact the validity of research results. Especially for clustering-first pipelines such as mothur and QIIME2, the choice of the reference database in terms of comprehensiveness and sensitivity has implications in the accuracy of microbial abundance estimation [12]. A standardized evaluation protocol may be beneficial to overcome the dilemma of pipeline selection. Regardless of the pipeline chosen, it is important going forward to ensure pipeline methodology is described completely, with every decision within the pipeline well-justified.
To build on the results of this study, future work analysing the role of each of the many environmental variables on abundance of Sulfurimonas by a stronger multivariate statistical test would be beneficial. Additionally, further analyses conducting processing with mothur and QIIME2 in parallel would enable stronger claims to be made regarding the relative ability of the two analysis pipelines, and more generally for OTU- and ASV-based analysis pipelines.
Torres-Beltran M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Finke J. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data. 4:170159. doi:10.1038/sdata.2017.159.
Ocean Networks Canada. 2013. Introduction to Saanich Inlet. Retrieved from http://www.oceannetworks.ca/introduction-saanich-inlet.
Stramma L, Schmidtko S, Levin LA, Johnson GC. 2010. Ocean oxygen minima expansions and their biological impacts. Deep Sea Res Part 1 Oceanogr Res Pap. 57(4):587-595. doi: 10.1016/j.dsr.2010.01.005
Wooley, J. C., Godzik, A., & Friedberg, I. (2010). A primer on metagenomics. PLoS Computational Biology, 6(2), e1000667. doi:10.1371/journal.pcbi.1000667
Callahan, B. J., Mcmurdie, P. J., & Holmes, S. P. (2017). Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal, 11(12), 2639. doi:10.1038/ismej.2017.119
Labrenz M, Grote J, Mammitzsch K, Boschker HTS, Laue M, Jost G, Glaubitz S, Jürgens K. 2013. Sulfurimonas gotlandica sp. nov., a chemoautotrophic and psychrotolerant epsilonproteobacterium isolated from a pelagic redoxcline, and an emended description of the genus Sulfurimonas. Int J Syst Evol Microbiol. 63(Pt 11):4141-4148. doi:10.1099/ijs.0.048827-0.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF. 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75:7537-7541. doi:10.1128/AEM.01541-09
Ortutay C, Ortutay Z. 2017. Introduction to R statistical environment, p 1-15. In Molecular Data Analysis, John Wiley & Sons, Hoboken, New Jersey, USA.
Bingham NH, Fry JM, & SpringerLink ebooks - Mathematics and Statistics. (2010). Regression: Linear models in statistics. London; New York: Springer.
Siegwald L, Touzet H, Lemoine Y, Hot D, Audebert C, Caboche S. 2017. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics. PLoS One. 12(1):e0169563. doi:10.1371/journal.pone.0169563.
Plummer E, Twin J, Bulach DM, Garland SM, Tabrizi SN. 2015. A Comparison of Three Bioinformatics Pipelines for the Analysis of Preterm Gut Microbiota using 16S rRNA Gene Sequencing Data. J Proteomics Bioinform. 8:283-291.
Sievert SM, Scott KM, Klotz MG, Chain PSG, Hauser LJ, Hemp J, Hagler M, Land M, Lapidus A, Larimer FW, Lucas S, Malfatti SA, Meyer F, Paulsen IT, Ren Q, Simon K, the USF Genomics Class. 2008. Genome of the Epsilonproteobacterial Chemolithoautotroph Sulfurimonas denitrificans. Appl Environ Microbiol. 74(4):1145-1156. doi:10.1128/AEM.01844-07.
None
Proofread for any errors.
None
None
Proofread for any errors.
None
Proofread for any errors.
Helped analyze the results.
Compiled all the figures for the document and edited titles/axis labels if necessary.
Wrote the entire results section based on the figures that were generated, minus any edits that other group members contributed in the final draft.
Helped edit this section with the suggestions provided.
None
None
Proofread for any errors and formatted any of the references.
Sequences for nitrite reductase (nirS) gene were obtained from various depths of Saanich Inlet, a model ecosystem for the effects of growing oxygen minimum zones in the open ocean. Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) was implemented for automated reconstruction of the nitrogen cycle along defined redox gradients in Saanich Inlet for metagenomic and metatranscriptomic assemblies of nirS. Increased DNA and RNA abundance of nirS was found with increasing depth under the suboxic and anoxic conditions, and no expression was observed at oxic conditions. Gammaproteobacteria, Planctomycetia, Deltaproteobacteria and Alphaproteobacteria groups predominante the overall nirS expression in the Saanich Inlet. It was found that the distribution of nirS throughout the water column may be due to Planctomycetia expressing anammox-type nirS that thrives in lower suboxic zones, and Gammaproteobacteria expressing denitrification-type nirS which is highly suitable for denitrification in anoxic zones and upper suboxic zones. In addition, nirS OTUs have been shown to be station and time-dependent. As the Saanich Inlet is a seasonally anoxic fjord, the DNA and RNA abundances for denitrification genes should fluctuate with time as the water column moves in and out of anoxia. In this project, we investigate the nirS gene in the denitrification pathway to assess in terms of taxonomic rank, abundance and expression along the redoxcline in Saanich Inlet and propose future directions to expand on our results.
Saanich Inlet acts as an excellent model system for oxygen minimum zones (OMZs), due to its seasonally anoxic character. A shallow sill separates the body of the inlet from the Strait of Georgia, and traps deep water throughout the spring and summer. Microbial activity quickly depletes dissolved oxygen below 100m, when large amounts of organic matter sink during the spring phytoplankton bloom. Then, during turbulent winter weather, the mixed layer depth increases and the anoxic deep water is exchanged for new, oxygen rich water over the shallow sill [1]. These processes repeat annually, resulting in a cycle of deep water replenishment, followed by the development of an anoxic region. The predictable nature of changing oxygen concentrations, combined with excellent sampling conditions make Saanich Inlet a first-rate model OMZ ecosystem.
The expansion of OMZs is expected to cause drastic shifts in global marine microbial community compositions, and therefore their functional capabilities. There is a demand for models of how OMZs form due to a global increase in OMZ size and intensity [2]. OMZs are bands of hypoxic water between 100m and 1000m in depth. Microbial respiration fueled by sinking organic matter from the euphotic zone depletes dissolved oxygen all the way to anoxia in extreme circumstances. The strength and depth range of these zones has been shown to be increasing on average due to anthropogenic climate change. Oxygen is not only a vital nutrient for macroscopic life, but also a determining factor in how microbial communities function. When oxygen is not available, bacteria can reduce other chemicals for energy production. In the absence of oxygen, nitrate is often the first choice.
Nitrogen is cycled biologically on an immense scale globally. It is a necessary component of all life in relatively large amounts. For practical reasons, nitrogen is often divided into two main pools: fixed nitrogen, which is soluble in water and bioavailable, and unfixed nitrogen, which is in a gaseous form and diffuses to the atmosphere. Cycling between these two pools occurs by three main processes. Nitrogen fixation is the conversion of N2 gas to NH4+ by diazotrophs. In the ocean, diazotrophs are mostly cyanobacteria, which require high light and iron availability. This limits nitrogen fixation to relatively small portions of the surface ocean. The two processes that counter nitrogen fixation, releasing fixed nitrogen back to the atmosphere, are annamox and denitrification. Annamox converts NO3- and NH4+ to N2 but is thought to contribute less to bioavailable nitrogen loss than denitrification. Denitrification is the stepwise reduction of NO3- to N2. Both annamox and denitrification require extremely low O2 environments and are prevalent in oceanic OMZs.
The ability of a community to denitrify is dependant on the genetic presence of a full nitrogen reduction pathway. Often, microbes will not contain the genes or transcribe the genes necessary for all intermediate steps of a metabolic pathway, instead opting to rely on other organisms for the processing of the steps they lack [3]. NirS is of particular interest because it catalyzes the conversion of nitrite (NO2-) to nitric oxide (NO) (Fig 1), the point of no return in the denitrification pathway where a bioavailable ion is converted to a gas. Here the distribution of nirS genes and transcripts across depths and across taxa is analysed in a representative summer Saanich Inlet water column.
Figure 1. Nitrogen Cycle
Environmental samples were collected through time-series monitoring in Saanich Inlet on a monthly-basis aboard the MSV John Strickland at station S3 (48°35.500 N, 123°30.300 W) [4, 5]. Samples for large volume (LV) SSU rRNA gene tags, metagenomics, metatranscriptomics, and metaproteomics were taken from six depths spanning the oxycline (10, 100, 120, 135, 150, 165, and 200âm). Samples for high-resolution (HR) SSU rRNA gene tag sequencing were taken at 16 depths along the oxycline (10, 20, 40, 60, 75, 80, 90, 97, 100, 110, 120, 135, 150, 165, 185 and 200âmeters). Water was collected with Go-Flo bottles and dissolved gas samples were taken immediately after the bottles were taken onboard the ship. Next, the remaining water was quickly drained into 2L Nalgene bottles through sterilized silicon tubing to minimise changes in microbial gene expression.A 0.22 μm Sterivex filter was used to collect biomass from water samples. Genomic DNA and RNA were then extracted from cells on these filters. RNA was reverse transcribed into cDNA and both genomic DNA and cDNA were used in the construction of shotgun Illumina libraries. Sequencing data were generated on the Illumina HiSeq platform with 2x150bp technology.
The resulting reads were processed and quality controlled at the Joint Genome Institute using the IMG/M pipeline. Metagenomes were assembled and processed using MetaPathways 2.5 at UBC.
Tree-based Sensitive and Accurate Protein Profiler (TreeSAPP) was implemented for automated reconstruction of the nitrogen cycle along defined redox gradients in Saanich Inlet using Google Cloud services for metagemomic (MetG, DNA) and metatranscriptomic (MetaT, RNA) assemblies of nirS.
Reads per kilo base per million mapped reads (RPKM) was used to quantify gene expression by normalizing for total read length and the number of sequencing reads when comparing the gene coverage values. This corrects the difference in sampling sequencing depth and gene length [6]. The formula below indicates numReads as the number of reads mapped to a gene sequence, geneLength as the length of the gene sequence and totalNumReads as the total number of mapped reads for a sample:
\[RPKM = \frac{numReads}{\frac{geneLength}{10^3}\times \frac{totalNumReads}{10^6}}\]
The following commands were used in the pipeline:
#!/bin/bash
cd /home/enenn/
export WISECONFIGDIR=/home/connor/TreeSAPP//data//genewise_support_files//wisecfg
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_10m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_10m.anqdp.fastq.gz -o treesapp/treesapp_out_G_10m
rm treesapp/treesapp_out_G_10m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_10m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_10m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_10m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_10m
rm treesapp/treesapp_out_T_10m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_10m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_100m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_100m.anqdp.fastq.gz -o treesapp/treesapp_out_G_100m
rm treesapp/treesapp_out_G_100m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_100m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_100m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_100m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_100m
rm treesapp/treesapp_out_T_100m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_100m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_120m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_120m.anqdp.fastq.gz -o treesapp/treesapp_out_G_120m
rm treesapp/treesapp_out_G_120m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_120m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_120m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_120m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_120m
rm treesapp/treesapp_out_T_120m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_120m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_135m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_135m.anqdp.fastq.gz -o treesapp/treesapp_out_G_135m
rm treesapp/treesapp_out_G_135m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_135m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_135m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_135m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_135m
rm treesapp/treesapp_out_T_135m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_135m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_150m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_150m.anqdp.fastq.gz -o treesapp/treesapp_out_G_150m
rm treesapp/treesapp_out_G_150m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_150m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_150m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_150m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_150m
rm treesapp/treesapp_out_T_150m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_150m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_165m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_165m.anqdp.fastq.gz -o treesapp/treesapp_out_G_165m
rm treesapp/treesapp_out_G_165m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_165m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 -p pe \
-i bucket/MetaG_assemblies/SI072_LV_165m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_165m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_165m
rm treesapp/treesapp_out_T_165m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_165m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 \
-i bucket/MetaG_assemblies/SI072_LV_200m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaG_reads/SI072_LV_200m.anqdp.fastq.gz -o treesapp/treesapp_out_G_200m
rm treesapp/treesapp_out_G_200m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_G_200m 777 &
time treesapp.py -T 8 --verbose --delete -t D0302 \
-i bucket/MetaG_assemblies/SI072_LV_200m_DNA.scaffolds.fasta --rpkm \
-r bucket/MetaT_reads/SI072_LV_200m.qtrim.3ptrim.artifact.rRNA.clean.fastq.gz \
-o treesapp/treesapp_out_T_200m
rm treesapp/treesapp_out_T_200m/RPKM_outputs/*.sam &
chmod -Rf treesapp/treesapp_out_T_200m 777
Interactive Tree of Life (iTOL) version 4.2 was used for the display, annotation and management of the phylogenetic trees produced for the nirS gene within the denitrification pathway [7].
DNA and RNA abundances of nirS were compared across depth and different taxa. Analysis was completed in R v3.4.3 [8] using the following packages and the TreeSAPP output was loaded into R and combined into a single dataframe, nirS.all using the following commands:
library(tidyverse)
library(cowplot)
## Warning: package 'cowplot' was built under R version 3.4.4
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
##
## ggsave
#DNA data
nirS.DNA.10m = read_tsv("nirS_DNA_10m_marker_contig_map.tsv") %>%
select(Tax.DNA.10 = Confident_Taxonomy, Abund.DNA.10 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_character(),
## Likelihood = col_character(),
## WTD = col_character()
## )
nirS.DNA.100m = read_tsv("nirS_DNA_100m_marker_contig_map.tsv") %>%
select(Tax.DNA.100 = Confident_Taxonomy, Abund.DNA.100 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.120m = read_tsv("nirS_DNA_120m_marker_contig_map.tsv") %>%
select(Tax.DNA.120 = Confident_Taxonomy, Abund.DNA.120 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.135m = read_tsv("nirS_DNA_135m_marker_contig_map.tsv") %>%
select(Tax.DNA.135 = Confident_Taxonomy, Abund.DNA.135 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.150m = read_tsv("nirS_DNA_150m_marker_contig_map.tsv") %>%
select(Tax.DNA.150 = Confident_Taxonomy, Abund.DNA.150 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.165m = read_tsv("nirS_DNA_165m_marker_contig_map.tsv") %>%
select(Tax.DNA.165 = Confident_Taxonomy, Abund.DNA.165 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.DNA.200m = read_tsv("nirS_DNA_200m_marker_contig_map.tsv") %>%
select(Tax.DNA.200 = Confident_Taxonomy, Abund.DNA.200 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
#RNA data
nirS.RNA.10m = read_tsv("nirS_RNA_10m_marker_contig_map.tsv") %>%
select(Tax.RNA.10 = Confident_Taxonomy, Abund.RNA.10 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_character(),
## Likelihood = col_character(),
## WTD = col_character()
## )
nirS.RNA.100m = read_tsv("nirS_RNA_100m_marker_contig_map.tsv") %>%
select(Tax.RNA.100 = Confident_Taxonomy, Abund.RNA.100 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.120m = read_tsv("nirS_RNA_120m_marker_contig_map.tsv") %>%
select(Tax.RNA.120 = Confident_Taxonomy, Abund.RNA.120 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.135m = read_tsv("nirS_RNA_135m_marker_contig_map.tsv") %>%
select(Tax.RNA.135 = Confident_Taxonomy, Abund.RNA.135 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.150m = read_tsv("nirS_RNA_150m_marker_contig_map.tsv") %>%
select(Tax.RNA.150 = Confident_Taxonomy, Abund.RNA.150 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.165m = read_tsv("nirS_RNA_165m_marker_contig_map.tsv") %>%
select(Tax.RNA.165 = Confident_Taxonomy, Abund.RNA.165 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
nirS.RNA.200m = read_tsv("nirS_RNA_200m_marker_contig_map.tsv") %>%
select(Tax.RNA.200 = Confident_Taxonomy, Abund.RNA.200 = Abundance, Query)
## Parsed with column specification:
## cols(
## Query = col_character(),
## Marker = col_character(),
## Taxonomy = col_character(),
## Confident_Taxonomy = col_character(),
## Abundance = col_double(),
## Likelihood = col_double(),
## WTD = col_double()
## )
#Combine all data frames
nirS.all = nirS.DNA.100m %>%
full_join(nirS.DNA.120m, by = "Query") %>%
full_join(nirS.DNA.135m, by = "Query") %>%
full_join(nirS.DNA.150m, by = "Query") %>%
full_join(nirS.DNA.165m, by = "Query") %>%
full_join(nirS.DNA.200m, by = "Query") %>%
full_join(nirS.RNA.100m, by = "Query") %>%
full_join(nirS.RNA.120m, by = "Query") %>%
full_join(nirS.RNA.135m, by = "Query") %>%
full_join(nirS.RNA.150m, by = "Query") %>%
full_join(nirS.RNA.165m, by = "Query") %>%
full_join(nirS.RNA.200m, by = "Query") %>%
mutate(Taxonomy = ifelse(!is.na(Tax.DNA.100), Tax.DNA.100,
ifelse(!is.na(Tax.DNA.120), Tax.DNA.120,
ifelse(!is.na(Tax.DNA.135), Tax.DNA.135,
ifelse(!is.na(Tax.DNA.150), Tax.DNA.150,
ifelse(!is.na(Tax.DNA.165), Tax.DNA.165,
ifelse(!is.na(Tax.DNA.200), Tax.DNA.200,
ifelse(!is.na(Tax.RNA.100), Tax.RNA.100,
ifelse(!is.na(Tax.RNA.120), Tax.RNA.120,
ifelse(!is.na(Tax.RNA.135), Tax.RNA.135,
ifelse(!is.na(Tax.RNA.150), Tax.RNA.150,
ifelse(!is.na(Tax.RNA.165), Tax.RNA.165,
ifelse(!is.na(Tax.RNA.200), Tax.RNA.200,
"unclassified"))))))))))))) %>%
select(-starts_with("Tax.")) %>%
gather("Key", "Abundance", starts_with("Abund")) %>%
separate(Key, c("Key","Type","Depth_m"), by = ".") %>%
select(Depth_m, Type, Abundance, Taxonomy, Query) %>%
mutate(Depth_m = as.numeric(Depth_m)) %>%
separate(Taxonomy, into = c("Domain", "Phylum", "Class", "Order",
"Family","Genus", "Species"), sep=";")
## Warning: Using ... for passing arguments to `strsplit()` is defunct
## Warning: Too few values at 1428 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
## 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
Figure 2. DNA vs. RNA relative abundance of nirS gene and transcripts from 10 to 200 metres depth in Saanich Inlet.
As shown in Figure 2, the DNA and RNA abundances of the nirS gene vary significantly by depth, with the general trend being that abundance increases with depth below 100m. Above 100m, the only depth that was measured was 10m, and there was neither DNA nor RNA found for the nirS gene. The abundance of DNA is higher than that of RNA, down to 150m, where RNA abundance increases significantly by about one order of magnitude from 82 to 750. This means that transcription of nitrite reductase, nirS, increases under anoxic conditions, which start around 150m. RNA abundance reaches a maximum at 165m, but the DNA abundance decreases by a sizeable amount at this depth, which could mean that there is lower gene diversity, but high rates of transcription in each organism, leading to more RNA. DNA abundance reaches a maximum at 200m, and RNA abundance decreases slightly, which means that there could be more diversity, but lower transcription rates for each organism. Overall, RNA abundances are significantly higher than DNA abundance below 150m, but above this depth, DNA abundances are higher, meaning that the nirS gene is present in cells, but it is not transcribed at high rates.
## Warning: Removed 1205 rows containing missing values (geom_point).
Figure 3. A: Change in nitrite, nitrate, and oxygen concentrations (Log10 scale) from 10 to 200 metres depth in Saanich Inlet. B: Change in DNA vs. RNA abundance of the nirS gene from 10 to 200 metres depth across different Orders in Saanich Inlet. Note that the Log10 was used for concentration as the nitrite concentration was relatively low compared to nitrate and oxygen.
## Warning: Removed 1205 rows containing missing values (geom_point).
Figure 4. A: Change in nitrite, nitrate, and oxygen concentrations (Log10 scale) from 10 to 200 metres depth in Saanich Inlet. B: Change in DNA vs. RNA abundance of the nirS gene from 10 to 200 metres depth across different Classes in Saanich Inlet. Note that the Log10 was used for concentration as the nitrite concentration was relatively low compared to nitrate and oxygen.
## Warning: Removed 1205 rows containing missing values (geom_point).
Figure 5. A: Change in nitrite, nitrate, and oxygen concentrations (Log10 scale) from 10 to 200 metres depth in Saanich Inlet. B: Change in DNA vs. RNA abundance of the nirS gene from 10 to 200 metres depth across different Families in Saanich Inlet. Note that the Log10 was used for concentration as the nitrite concentration was relatively low compared to nitrate and oxygen.
Figure 6. Relative abundances of Class at various depths in Saanich Inlet. The most abundant class with 90%-100% relative abundance at all depth are Acidlmicroblia, and Actinobacteria, The intermediate abundance of class between 53%-85% relative abundance are mainly unclassified Archaea, Deltaproteobacteria, Epsilonproteobacteria, Gammaproteobacteria. The Phyla with the lowest relative abundances with approximately 2%-20% are mainly Marinimicrobia, Planctomycetes, Varrucomicrobia and unclassified Proteobacteria.
Figure 7. nirS relative DNA and RNA expression in 100m, 120m, 135m, 150m, 165m, 200m depth relative to taxa. Taxonomic classification of organisms are divided by different colours, shown in a wheel at the end of the tree branch. The green and red bars extending out from the organism names depict relative DNA and RNA abundances of nirS gene, respectively. 10m is not shown as there were no DNA or RNA expression of nirS for all analyzed organisms.
As shown in Figures 4 and 7, diversity of organisms containing the nirS gene varies with depth. Gammaproteobacteria genera Colwellia, Oleispira, Olephilus, Thalassomonas, Ridgeia and Sedimenticola have consistent relative level of DNA expression compared to other organisms at all analyzed depths. At 100m, the level of RNA transcription follows the trend of DNA expression in members of the Gammaproteobacteria, Planctomycetia and Deltaproteobacteria class. At 120m, the RNA transcripts of nirS for Gammaproteobacteria are less than DNA. However, the inverse is true for some members of the Alphaproteobacteria, Anaerolineae, Planctomycetia, Deltaproteobacteria, Deinococci, Bacteroidetes, Aquificae, Epsilonproteibacteria, Aciditiobacillia classes, in which the RNA transcription level is much higher than DNA expression. Of note, Planctomycemia have the highest RNA abundance. At 135m, the same trend in relative DNA and RNA is observed as at 120m. At 150m, the O2 concentration decreases to zero, there is high nirS expression in Gammaproteobacteria, and medium to low expression in Alphaproteobacteria. RNA levels follows the trend of DNA expression. At 165m, NO3 concentration also decreases to zero. NirS expression decreases in Alphaproteobacteria, but slightly increases in Planctomycetia and Deltaproteobacteria. RNA abundances still follows the trend of DNA. At 200m, the O2, NO3 and NO2 concentrations are zero. RNA is lower than DNA in all previously mentioned Gammaproteobacteria species except for Olephilus. However, RNA abundance is slightly higher than DNA in the Anaerolineae class. We must mention that the systemic nirS expression observed in all species at 135m, 150m and 200m is an artifact. There is an organism identified at a higher taxonomic level with nirS expression, but the pipeline cannot differentiate it further. Thus, nirS expression was conferred to all species under that taxonomic level.
Denitrification is a sequential, four-step process, nitrate (NO3-), a fixed inorganic species, is reduced to dinitrogen gas (N2). This process only occurs under suboxic and anoxic conditions (1-2% oxygen saturation), where the most favourable terminal electron acceptor (TEA), oxygen (O2) is limiting and organisms take up the next favourable TEA, NO3- [9,10]. Our gene of interest, nirS; nitrite reductase, is involved in the reduction of nitrite (NO2-) to nitric oxide (NO) in the second step of denitrification and also contains cytochrome cd1 [11, 12]. Within the water column of Saanich Inlet, dissolved oxygen concentrations decrease steadily with depth until 150m, where values quickly approach zero (Figures 3A, 4A, 5A.). This results to a sharp increase in DNA and RNA abundance, consistent with the fact that denitrification happens under suboxic and anoxic conditions (Figures 3A, 4A, 5A.).
This nitrite reduction process is shared with another nitrite reductase, nirK, which contains copper instead of the cytochrome cd1 used in nirS [11]. Gene analyses from experiments done in other other anoxic/suboxic basins, such as the Baltic Sea, Black Sea and the San Francisco Bay, have shown that gene abundances were about 1-2 orders of magnitude higher for nirS than in nirK in the San Francisco Bay [13]. OTU abundances based on a 95% sequence similarity cutoff for all experiments were also higher for nirS compared to nirK by up to five times at certain stations in the Baltic Sea and the San Francisco Bay [13,14]. However, in the Black Sea, the OTU abundance of nirS is either approximately the same, or up to five times lower than the abundance of nirK [15]. Alpha-diversity and richness of nirS and nirK OTUs are largely dependent on the station and time of year, which is seen in all three experiments [13,14,15].
Our TreeSAPP analysis showed that the Gammaproteobacteria, Planctomycetia, Deltaproteobacteria and Alphaproteobacteria groups predominante the overall nirS expression in the Saanich Inlet. There are two noteworthy points to highlight from the results. First, we observed that the Gammaproteobacteria is present in the Saanich Inlet at relatively high abundances of 50-85% at all depths (Figure 6.), which corresponds to also having the highest relative abundance of nirS gene copies at all analyzed depths (Figure 7.). This observation suggests a high level of denitrifying metabolic activity within this group. Studies with pure cultures and seawater enrichments have shown that the Gammaproteobacteria may maintain a higher ribosome content per cell [16,17], with which the increased rRNA levels may permit these bacteria to better respond to rapid changes in oxygen concentration in the seasonally anoxic Saanich Inlet [18]. Secondly, despite Gammaproteobacteria exhibiting relatively high level of nirS DNA, lower RNA abundance was detected at 120m and 135m, where non-Proteobacteria groups such as Planctomycetia dominate (Figure 7.). It is also worthy to note that Planctomycetia composes less than 10% of overall organism abundance at these depths (Figure 6.), but disproportionality accounts for a large portion of nirS expression (Figure 7.). Planctomycetia is known to mainly carry out anammox metabolism [19]. Gene-sequencing studies investigating expression of anammox sequences have shown that nirS genes expressed consistently at lower suboxic zones, such as 120m and 135m in Saanich, are closely allied to the environmental anammox clade [20]. The presence of anammox bacteria in the lower suboxic zone may be due to the flux of ammonium from the sulfide zone that affects these depths [20].
Meanwhile, Gammaproteobacteria may express nirS that is more closely related to denitrification pathways, which studies have shown to be more abundant in anoxic zones, and also in upper suboxic zones (O2 > 10 µM) [20]. While anoxic environments are the most suitable for denitrification, studies have also suspected that the enzymatic machinery in Alpha- and Gammaproteobacteria for anaerobic denitrification is likely to be similar to aerobic denitrification, allowing possible denitrification in upper suboxic zones [21]. In other words, the distribution of nirS throughout the water column may be due to Planctomycetia expressing anammox-type nirS that thrives in lower suboxic zones, and Gammaproteobacteria expressing denitrification-type nirS which is highly suitable for denitrification in anoxic zones and upper suboxic zones.
In this investigation, we have seen the abundance and expression of nirS vary with taxa and location. This is one of a number of genes involved in the distributed metabolism of the N cycle. It is likely advantageous for the genes in the N cycle to be specialized in order to interact with the many forms nitrogen can adopt. It is also advantageous to distribute these genes across specific taxa in the community since this increases the adaptability of microbial communities in response to their ecosystem. In addition, transitions in the oxidation state of nitrogen occur as a result of thermodynamically driven microbial reactions. It would be energetically irrational for an organism inhabiting anoxic regions of the ocean to actively express genes involved in nitrification. Microbial communities involved in the N cycle have evolved in symbiosis to achieve energetically efficient niches. For example, consider the diazotrophic cyanobacterium UCYN-A. UCYN-A is lacking photosystem II, RubisCo and the tricarboxylic acid cycle, meaning it is not capable of fixing carbon. Instead, it has formed a symbiotic relationship with a prymnesiophyte where UCYN-A receives fixed carbon in exchange for nitrogen [22]. Until the identification of UCYN-A, nitrogen fixation in the oceans had been attributed mainly to Trichodesmium [23]. Further investigation will likely yield even more discoveries indicating the vastness of the N cycle. Ecological studies of microbial community dynamics could provide further insight into the functioning N cycle.
It should be noted that some genes in the nitrogen cycle have similar functions. For example, the nirK gene also performs nitrite reduction, like nirS. As alpha-diversity and richness of nirS and nirK OTUs have been shown to be station and time-dependent [13,14,15], this should also apply to Saanich Inlet, as it is a seasonally anoxic fjord, which can affect the distribution of denitrification pathways at different parts of the year as oxygen concentrations in the water column change. As a result, DNA and RNA abundances for denitrification genes should fluctuate with time as the water column moves in and out of anoxia. Accordingly, an avenue for future research would be to compare nirS abundance to nirK abundance, by obtaining the nirK dataset using TreeSAPP.
Another method of microbial NO2- loss in the water column is through anaerobic ammonium oxidation (anammox), which converts either NO2- or NH4+ to dinitrogen gas (N2) using either the Hao (hydroxilamine reductase) or Hzo (hydroxylamine oxidoreductase) gene [12]. Autotrophic bacteria have been shown to undergo anammox [24], meaning that they could potentially have a selective advantage where organic carbon flux from the surface is limited, given that they can produce their own organic carbon. In addition, there are also bacteria that can use an alternative organic carbon source, propionate, for growth [25]. Although bacteria that undergo anammox are slow-growing [26], these alternative source of organic carbon, means that they can potentially have a competitive advantage over nitrite reducers in certain areas of the ocean. Although our analysis is primarily focused on denitrification pathways, comparing nirS and nirK abundance with Hao and Hzo abundances would help determine if annamox has an effect on denitrification rates in Saanich Inlet.
None
Proofread for any errors.
None
None
Proofread for any errors.
None
Proofread for any errors.
Helped analyze the results.
Compiled all the figures in R for the document and edited titles/axis labels if necessary.
Wrote the first paragraph of the section.
Helped edit this section with the suggestions provided.
I did research for the first 2 and last 2 paragraphs
I wrote the first 2 and last 2 paragraphs.
Proofread for any errors and formatted any of the references.